Matrix Transpose Tutorial Cleanup #1917
base: develop
Conversation
I think we also need to clean up the description of the problem to highlight the transpose of the indices, as well as the CPU implementations, to make them consistent with the GPU versions.
Thanks for getting this started @MrBurmark. I took a follow-up pass, but I think I could use a second set of eyes on the Kernel implementation. Would anyone in @LLNL/raja-core be able to take a look?
```diff
  RAJA::loop_icount<hip_threads_x>(ctx, row_tile, [&] (int col, int tx) {

-   d_Atview(col, row) = Tile_Array[ty][tx];
+   d_Atview(row, col) = Tile_Array[tx][ty];
```
@MrBurmark, I switched it around so it's clear that the x and y threads have been transposed in shared memory. I'm not too sure how to express that in Kernel.
Would it be clearer to call `row` and `col` here `rowt` and `colt`?
good idea!
… bugfix/burmark1/transpose
Summary
Fix the CUDA and HIP matrix transpose tutorial exercises: fix spacing, add proper synchronization, and map threads properly in the Teams implementation.